Mixed precision neural network quantization method based on Octave convolution
ZHANG Wenye, SHANG Fangxin, GUO Hao
Journal of Computer Applications    2021, 41 (5): 1299-1304.   DOI: 10.11772/j.issn.1001-9081.2020071106
Abstract
Deep neural networks with 32-bit weights require substantial computing resources, making large-scale deep neural networks difficult to deploy in scenarios with limited computing power, such as edge computing. To address this problem, a plug-and-play neural network quantization method was proposed to reduce the computational cost of large-scale neural networks while avoiding a significant drop in model performance. Firstly, the high-frequency and low-frequency components of the input feature map were separated based on Octave convolution. Secondly, convolution kernels with different bit widths were applied to the high- and low-frequency components respectively. Thirdly, the high- and low-frequency convolution results were quantized to the corresponding bit widths using different activation functions. Finally, the feature maps of different precisions were mixed to obtain the output of the layer. Experimental results verify the effectiveness of the proposed method for model compression: when the model was compressed to 1+8 bit(s), the accuracy dropped by less than 3 percentage points on the CIFAR-10/100 datasets; moreover, a ResNet50-based model compressed to 1+4 bit(s) retained accuracy above 70% on the ImageNet dataset.
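The pipeline the abstract describes — split a feature map into high- and low-frequency components, process each at a different bit width, then mix the results — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes a single-channel feature map, uses 2×2 average pooling as the Octave-style low-frequency split, simple uniform quantization in place of the paper's learned convolutions and activation functions, and hypothetical bit assignments (`hi_bits`, `lo_bits`).

```python
import numpy as np

def avg_pool2(x):
    # 2x2 average pooling: the low-frequency component at half resolution
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    # nearest-neighbour upsampling back to full resolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def quantize(x, bits):
    # uniform symmetric quantization to the given bit width
    levels = 2 ** bits - 1
    scale = float(np.abs(x).max()) or 1.0
    q = np.round(x / scale * (levels / 2))
    return q * scale / (levels / 2)

def octave_mixed_precision(x, hi_bits=8, lo_bits=4):
    # Step 1: separate frequency components (Octave-style split)
    low = avg_pool2(x)                  # smooth, half-resolution part
    high = x - upsample2(low)           # high-frequency residual
    # Steps 2-3: quantize each component to its own bit width
    # (which component gets which width is an assumption here)
    high_q = quantize(high, hi_bits)
    low_q = quantize(low, lo_bits)
    # Step 4: mix the precisions back into one output feature map
    return high_q + upsample2(low_q)
```

In the paper the two branches are real convolutions with learned kernels at different bit widths; here the quantizer stands in for both so the data flow of the four steps stays visible.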
APP component recognition method based on object detection
ZHANG Wenye
Journal of Computer Applications    DOI: 10.11772/j.issn.1001-9081.2019081420
Accepted: 30 October 2019